An HMM-based speech-to-video synthesizer

Authors

  • Jay J. Williams
  • Aggelos K. Katsaggelos
Abstract

Emerging broadband communication systems promise a future of multimedia telephony, e.g. the addition of visual information to telephone conversations. It is useful to consider the problem of generating the critical information needed for speechreading, based on existing narrowband communications systems used for speech. This paper focuses on the problem of synthesizing visual articulatory movements given the acoustic speech signal. In this application, the acoustic speech signal is analyzed and the corresponding articulatory movements are synthesized for speechreading. This paper describes a hidden Markov model (HMM)-based visual speech synthesizer. The key elements in the application of HMMs to this problem are the decomposition of the overall modeling task into key stages and the judicious determination of the observation vector's components for each stage. The main contribution of this paper is a novel correlation HMM that is able to integrate independently trained acoustic and visual HMMs for speech-to-visual synthesis. This model allows increased flexibility in choosing model topologies for the acoustic and visual HMMs. Moreover, the proposed model reduces the amount of training data required compared to early integration modeling techniques. Results from objective experimental analysis show that the proposed approach can reduce time alignment errors by 37.4% compared to a conventional temporal scaling method. Furthermore, subjective results indicated that the proposed model can increase speech understanding.

Similar resources

A hidden Markov model based visual speech synthesizer

This paper describes a hidden Markov model (HMM) based visual synthesizer designed to assist persons with impaired hearing. This synthesizer builds on results in the area of audio-visual speech recognition. We describe how a correlation HMM can be used to integrate independent acoustic and visual HMMs for speech-to-visual synthesis. Our results show that an HMM correlating model can significantly...

Speech enhancement based on hidden Markov model using sparse code shrinkage

This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a maximum a posteriori (MAP) estimator based on a Laplace-Gaussian (for clean speech and noise respectively) combination in the HMM ...

Towards a linear dynamical model based speech synthesizer

We present recent developments towards building a speech synthesis system completely based on Linear Dynamical Models (LDMs). Specifically, we describe a decision tree-based context clustering approach to LDM-based speech synthesis and an algorithm for parameter generation using global variance with LDMs. In order to capture the speech dynamics, LDMs need coarser phoneme segmentation than the 5...

The HMM synthesis algorithm of an embedded unified speech recognizer and synthesizer

In this paper we present an embedded unified speech recognizer and synthesizer using identical, speaker-independent Hidden Markov Models. The system was prototypically realized on a signal processor extended by a field programmable gate array. In a first section we will give a brief overview of the system. The main part of the paper deals with a specially designed unit based HMM synthesis algorit...

HMM based Myanmar text to speech system

This paper presents a complete statistical speech synthesizer for Myanmar which includes a syllable segmenter, text normalizer, grapheme-to-phoneme convertor, and an HMM-based speech synthesis engine. We believe this is the first such system for the Myanmar language. We performed a thorough human evaluation of the synthesizer relative to human and re-synthesized baselines. Our results show that...

Journal:
  • IEEE Transactions on Neural Networks

Volume 13, Issue 4

Pages: -

Publication date: 2002